2025/7/18

Context Engineering for AI Agents

Yichao 'Peak' Ji

Lessons from building Manus - an AI agent framework that leverages in-context learning of frontier models

"Context engineering is still an emerging science—but for agent systems, it's already essential."

The Choice: End-to-End vs In-Context Learning

At the beginning of the Manus project, we faced a key decision: train an end-to-end agent model of our own, or build on the in-context learning abilities of frontier models?

Our choice: Bet on context engineering

Why? It lets us ship improvements in hours instead of weeks, and keeps our product orthogonal to the underlying models

We call our manual process of architecture searching, prompt fiddling, and empirical guesswork "Stochastic Graduate Descent"

Design Around the KV-Cache

The single most important metric for a production-stage AI agent: KV-cache hit rate

Why it matters:

[Figure: KV-cache illustration]

Cached input tokens cost 0.30 USD/MTok vs 3 USD/MTok for uncached - a 10x difference!

Improving KV-Cache Hit Rate

Key Practices:

Keep your prompt prefix stable - even a one-token difference invalidates the cache from that point onward

Make your context append-only - never modify previous actions or observations

Ensure deterministic serialization - e.g., stable key ordering when writing JSON

Common mistake: including a timestamp at the beginning of the system prompt kills your cache hit rate

If you self-host models using a framework like vLLM, make sure prefix/prompt caching is enabled
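The practices above can be sketched as a minimal cache-friendly context builder. This is an illustrative assumption of how such a builder might look (the prompt text and event shapes are hypothetical), not Manus's actual implementation:

```python
import json

# Stable system prompt: no timestamps, no random IDs, nothing that
# changes between requests and breaks the shared prefix.
SYSTEM_PROMPT = "You are an agent. Use the available tools to complete the task."

def serialize_event(event: dict) -> str:
    # sort_keys + fixed separators make serialization deterministic,
    # so identical history always yields byte-identical prefix tokens.
    return json.dumps(event, sort_keys=True, separators=(",", ":"))

def build_context(history: list[dict]) -> str:
    # Append-only: earlier events are never edited, so every request
    # shares the longest possible cached prefix with the previous one.
    return SYSTEM_PROMPT + "\n" + "\n".join(serialize_event(e) for e in history)

history = [{"role": "user", "content": "Book a flight to Tokyo"}]
ctx1 = build_context(history)
history.append({"role": "tool", "name": "browser_open", "result": "ok"})
ctx2 = build_context(history)
assert ctx2.startswith(ctx1)  # unchanged prefix -> KV-cache hit on next request
```

The key property is the final assertion: each new context is a strict extension of the previous one, which is exactly what prefix caching rewards.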

Mask, Don't Remove

As agents gain more capabilities, their action space grows more complex

Dynamic action spaces (like RAG-based tool loading) cause problems: tool definitions live near the front of the context, so adding or removing tools invalidates the KV-cache for everything after them - and when earlier actions reference tools no longer defined, the model gets confused and is more likely to produce schema violations or hallucinated tool calls

[Figure: Logit masking illustration]

Manus uses a context-aware state machine to manage tool availability by masking token logits during decoding

Function Calling Modes

We constrain action selection by masking token logits directly:

Auto - the model may choose to call a function or not

Response prefilled with: <|im_start|>assistant

Required - the model must call a function, but the choice is unconstrained

Response prefilled with: <|im_start|>assistant<tool_call>

Specified - the model must call a function from a specific subset

Response prefilled with: <|im_start|>assistant<tool_call>{"name": "browser_

Consistent tool prefixes (browser_, shell_) allow easy enforcement of tool groups
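A toy sketch of prefix-based masking, under the simplifying assumption that each tool name is a single "token" (a real implementation masks at the tokenizer level during constrained decoding; the tool names here are hypothetical):

```python
import math

def mask_by_prefix(logits: dict[str, float], allowed_prefix: str) -> dict[str, float]:
    # Set disallowed tools to -inf so they can never be sampled,
    # without adding or removing any tool definitions from the context.
    return {
        name: (score if name.startswith(allowed_prefix) else -math.inf)
        for name, score in logits.items()
    }

raw = {"browser_open": 1.2, "browser_click": 0.4, "shell_exec": 2.0, "shell_read": 0.1}
masked = mask_by_prefix(raw, "browser_")
choice = max(masked, key=masked.get)
# shell_exec had the highest raw score, but only browser_* tools survive the mask
```

Because the tool definitions themselves never change, the KV-cache stays valid; only the decoding step is constrained.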

Use the File System as Context

Modern LLMs offer 128K+ token contexts, but in agentic scenarios that is often not enough: observations like web pages and PDFs can be huge, model performance degrades long before the window is full, and long inputs stay expensive even with caching

[Figure: File system as context]

Manus treats the file system as ultimate context: unlimited, persistent, and directly operable by the agent

Compression strategies are always designed to be restorable: a web page's content can be dropped from the context as long as its URL is kept, and a document's content as long as its path remains available

Manipulate Attention Through Recitation

Manus creates and updates todo.md files during complex tasks

Why this matters: a typical Manus task requires around 50 tool calls, and over such a long loop the model can drift off-topic or forget its earlier goals

[Figure: todo.md example]

By constantly rewriting the todo list, Manus recites its objectives into the end of the context, pushing the global plan into the model's recent attention span

This avoids "lost-in-the-middle" issues and reduces goal misalignment
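The recitation loop can be sketched as follows; the function names and event shapes are hypothetical, chosen only to show the mechanism of re-rendering the full plan at the end of the context after every step:

```python
def render_todo(tasks: list[tuple[str, bool]]) -> str:
    # Rewriting the full list at every step pushes the global plan
    # to the end of the context, inside the model's recent attention span.
    lines = ["# todo.md"]
    for description, done in tasks:
        lines.append(f"- [{'x' if done else ' '}] {description}")
    return "\n".join(lines)

def step(history: list[str], observation: str, tasks: list[tuple[str, bool]]) -> None:
    history.append(observation)
    history.append(render_todo(tasks))  # recite objectives at context end

history: list[str] = []
step(history, "opened dataset", [("download data", True), ("train model", False)])
# the last entry in history is always the up-to-date global plan
```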

Keep the Wrong Stuff In

Agents make mistakes - that's reality, not a bug

Common impulse: Hide errors, clean up traces, retry actions

Problem: Erasing failure removes the evidence the model needs to adapt

[Figure: Error handling illustration]

Most effective improvement: Leave wrong turns in context

When the model sees a failed action and the resulting observation, it implicitly updates its internal beliefs, shifting its prior away from similar actions

Error recovery is one of the clearest indicators of true agentic behavior
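A sketch of this idea: on failure, append both the failed action and its error observation to the trace instead of silently retrying. The wrapper and the flaky tool are hypothetical stand-ins:

```python
def run_tool(history: list[dict], name: str, args: dict, tool_fn) -> None:
    # Keep failures in the context: the failed action and its error
    # message become evidence the model can adapt from on later steps.
    history.append({"action": name, "args": args})
    try:
        result = tool_fn(**args)
        history.append({"observation": result})
    except Exception as exc:
        # Do NOT pop the failed action or reset state - record the error.
        history.append({"observation": f"ERROR: {type(exc).__name__}: {exc}"})

def flaky_shell(cmd: str) -> str:
    raise TimeoutError("command timed out after 30s")

history: list[dict] = []
run_tool(history, "shell_exec", {"cmd": "make build"}, flaky_shell)
# history now preserves both the wrong turn and its consequence
```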

Don't Get Few-Shotted

Few-shot prompting can backfire in agent systems

Language models are excellent mimics - they imitate patterns in context

Danger in tasks with repetitive decisions (e.g., reviewing a batch of resumes): the model falls into the rhythm of its own history, repeating similar actions even when they are no longer appropriate, which leads to drift, overgeneralization, or hallucination

[Figure: Few-shot mimicry example]

Fix: Increase diversity with structured variation in actions and observations

The more uniform your context, the more brittle your agent becomes
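One way to introduce structured variation is to render the same observation through alternating templates. The templates below are hypothetical examples of varied surface form that carries identical information:

```python
import random

# Same information, varied phrasing: breaks the mimicry pattern
# without changing what the model learns from each observation.
TEMPLATES = [
    "Observation from {tool}: {obs}",
    "{tool} returned: {obs}",
    "[{tool}] -> {obs}",
]

def render_observation(tool: str, obs: str, rng: random.Random) -> str:
    # Controlled randomness in serialization; the content is untouched.
    return rng.choice(TEMPLATES).format(tool=tool, obs=obs)

rng = random.Random(42)
rendered = [render_observation("browser_open", "page loaded", rng) for _ in range(5)]
```

The same trick applies to ordering and minor formatting noise: small, controlled perturbations per step keep the context from becoming a pattern the model blindly continues.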

Conclusion

Context engineering is still an emerging science—but for agent systems, it's already essential

How you shape the context ultimately defines how your agent behaves: how fast it runs, how well it recovers, and how far it scales

"The agentic future will be built one context at a time. Engineer them well."

These patterns worked for us after repeated rewrites, dead ends, and real-world testing across millions of users